3D Pose-by-Detection of Vehicles via Discriminatively Reduced Ensembles of Correlation Filters: Supplementary Material
Authors
Abstract
In this supplementary material we show a full derivation of our Reduced Ensemble of Correlation Filters and provide more quantitative results that are not shown in the paper due to space constraints.

© 2014. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
Movshovitz-Attias, Boddeti, Wei, Sheikh: 3D Pose-by-Detection

1 Ensemble of Exemplar Classifiers for Pose-by-Detection

1.1 Exemplar Correlation Filters

Exemplar classifiers are suited to the task of pose-by-detection. For each one of the $V$ viewpoint renders we train an Exemplar Correlation Filter (ECF) using the rendered image as the single positive, and $N-1$ image patches selected randomly from a background set of images that do not contain the object instance. Each ECF is trained to detect the object from a specific viewpoint. Let $\{x_i\}_{i=1}^{N}$ be a set of Histogram of Oriented Gradients (HOG) representations of the training examples, consisting of one positive exemplar rendering of the $v$-th view and $N-1$ negative bounding boxes. Also, define $\{g_v^1, \dots, g_v^C\}$ as the ECF for a viewpoint $v$, where $C$ is the number of channels of the HOG feature representation (commonly 32). The response of an image $x_i$ to the filter is defined as

$$\sum_{c=1}^{C} x_i^c \otimes g_v^c = \text{Correlation Output}, \tag{1}$$

where $\otimes$ denotes the 2D convolution operator. The ECF design is posed as:

$$\min_{g_v^1, \dots, g_v^C} \; \sum_{i=1}^{N} \left\| \sum_{c=1}^{C} x_i^c \otimes g_v^c - r_i \right\|_2^2 + \lambda \sum_{c=1}^{C} \left\| g_v^c \right\|_2^2, \tag{2}$$

where $r_i$ is the matrix holding the desired correlation output of the $i$-th training image, and $\lambda$ moderates the degree of regularization. The desired correlation output $r_i$ is set to a positively scaled Gaussian for the positive exemplar and to a negatively scaled Gaussian for the negative patches. This choice of the desired output correlation shape also implicitly calibrates the different exemplar classifiers.
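The response in Eq. (1) and the objective in Eq. (2) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes circular (FFT-based) 2D convolution, feature arrays of shape (C, H, W), and a centred Gaussian target; the function names, the `sigma` width, and the unit scale values are hypothetical choices made only for illustration.

```python
import numpy as np


def gaussian_target(shape, scale, sigma=2.0):
    """Desired output r_i: a centred 2D Gaussian multiplied by `scale`.

    Assumption: `scale` > 0 for the positive exemplar and < 0 for negatives;
    `sigma` is a hypothetical width parameter not given in the text.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist2 = (ys - cy) ** 2 + (xs - cx) ** 2
    return scale * np.exp(-dist2 / (2.0 * sigma ** 2))


def response(x, g):
    """Eq. (1): sum over channels c of the 2D convolution of x^c with g^c.

    x and g are real arrays of shape (C, H, W). Circular convolution is
    assumed, computed as a product in the Fourier domain.
    """
    X = np.fft.fft2(x, axes=(-2, -1))
    G = np.fft.fft2(g, axes=(-2, -1))
    return np.real(np.fft.ifft2((X * G).sum(axis=0)))


def ecf_objective(xs, rs, g, lam):
    """Eq. (2): squared error to the desired outputs plus lam * sum_c ||g^c||^2."""
    data_term = sum(np.sum((response(x, g) - r) ** 2) for x, r in zip(xs, rs))
    return data_term + lam * np.sum(g ** 2)
```

With one positive and $N-1$ negatives, `rs` would hold one target built with a positive `scale` followed by $N-1$ targets built with a negative `scale`, and `ecf_objective` evaluates the cost of a candidate filter `g` for that view.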
The minimization problem can be equivalently posed in the frequency domain to derive a closed form expression, which in turn lends itself to an efficient solution [1]. It should be noted that, within the complete set, each view $v \in V$ is trained independently, and that an increase in the desired precision $d$ increases the size of the ensemble (linearly for one axis of rotation, quadratically for two, and cubically for all three). Figure 1 (A) shows the training configuration for one exemplar correlation filter. For visualization clarity we do not show negative images.

1.2 Discriminative Reduction of Ensembles of Correlation Filters

The procedure described in Section 1 produces a large set of exemplar classifiers, one per view that needs to be resolved. Let $G \in \mathbb{R}^{D \times V}$ be the matrix of all $V$ filters arranged as column vectors, where $D$ is the dimensionality of the feature. This set is an exhaustive representation of the object's appearance from many views, but applying all the filters at test time is computationally expensive. It is also highly redundant, as many views of the object are similar in appearance. Our reduced Ensemble of Exemplar Correlation Filters (EECF) approach is designed to jointly learn a set of $K$ exemplar correlation filters $F = [f_1, \dots, f_K]$ (each with $C$ channels) and a set of $V$ sparse coefficient vectors $A = [\alpha_1, \dots, \alpha_V]$ such that a detector $g_v$ for any viewpoint $v$ of the object is defined by
Similar Resources
3D Pose-by-Detection of Vehicles via Discriminatively Reduced Ensembles of Correlation Filters
Estimating the precise pose of a 3D model in an image is challenging; explicitly identifying correspondences is difficult, particularly at smaller scales and in the presence of occlusion. Exemplar classifiers have demonstrated the potential of detection-based approaches to problems where precision is required. In particular, correlation filters explicitly suppress classifier response caused by ...
3D Gabor Based Hyperspectral Anomaly Detection
Hyperspectral anomaly detection is one of the main challenging topics in both military and civilian fields. The spectral information contained in a hyperspectral cube provides a high ability for anomaly detection. In addition, the costly spatial information of adjacent pixels such as texture can also improve the discrimination between anomalous targets and background. Most studies miss the wort...
Synthetic 3D Model-Based Object Class Detection and Pose Estimation. (Détection de Classes d'Objets et Estimation de leurs Poses à partir de Modèles 3D Synthétiques)
The present thesis describes 3D model-based approaches to object class detection and pose estimation on single 2D images. We introduce learning, detection and estimation steps adapted to the use of synthetically rendered training data with known 3D geometry. Most existing approaches recognize object classes for a particular viewpoint or combine classifiers for a few discrete views. By using CAD...
GPS-LiDAR Sensor Fusion Aided by 3D City Models for UAVs
Outdoor positioning for Unmanned Aerial Vehicles (UAVs) commonly relies on GPS signals, which might be reflected or blocked in urban areas. In such cases, additional on-board sensors such as Light Detection and Ranging (LiDAR) are desirable. To fuse GPS and LiDAR measurements, it is important, yet challenging, to accurately characterize the error covariance of the sensor measurements. In this p...
Object Detection and Segmentation using Discriminative Learning
Jingdan Zhang: Object Detection and Segmentation using Discriminative Learning. (Under the direction of Leonard McMillan.) Object detection and segmentation algorithms need to use prior knowledge of objects’ shape and appearance to guide solutions to correct ones. A promising way of obtaining prior knowledge is to learn it directly from expert annotations by using machine learning techniques. P...
Publication date: 2014